AITopics

Country: Asia > China (0.14)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Neural Information Processing SystemsMay-1-2026, 05:16:03 GMT

Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption using prompt learning for numerous downstream tasks. Previous works have shown test-time prompt tuning using entropy minimization to adapt text prompts for unseen domains. While effective, this overlooks the key cause for performance degradation to unseen domains - distribution shift. In this work, we explicitly handle this problem by aligning the out-of-distribution (OOD) test sample statistics to those of the source data using prompt tuning. We use a single test sample to adapt multi-modal prompts at test time by minimizing the feature distribution shift to bridge the gap in the test domain. Evaluating against the domain generalization benchmark, our method improves zero-shot top1 accuracy beyond existing prompt-learning techniques, with a 3.08%improvement over the baseline MaPLe. In cross-dataset generalization with unseen categories across 10 datasets, our method improves consistently across all datasets compared to the existing state-of-the-art.

large language model, machine learning, natural language, (19 more...)

Country: Europe (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.82)

Neural Information Processing SystemsFeb-18-2026, 04:01:35 GMT

Align Y our Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

TPT does not explicitly align the pre-trained CLIP to become aware of the test sample distribution. For the effective test-time adaptation of V -L foundation models, it is crucial to bridge the distribution gap between the pre-training dataset and the downstream evaluation set for high zero-shot generalization.

large language model, machine learning, natural language, (19 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Sweden > Östergötland County > Linköping (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.62)

arXiv.org Artificial IntelligenceNov-25-2025

The Alignment Paradox of Medical Large Language Models in Infertility Care: Decoupling Algorithmic Improvement from Clinical Decision-making Quality

Liu, Dou, Long, Ying, Zuoqiu, Sophia, Xie, Kaipeng, Yang, Runze, Liu, Di, Li, Kang, Lin, Yiting, Liu, Hanyi, Yin, Rong, Tang, Tian

Large language models (LLMs) are increasingly adopted in clinical decision support, yet aligning them with the multifaceted reasoning pathways of real-world medicine remains a major challenge. Using more than 8,000 infertility treatment records, we systematically evaluate four alignment strategies: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), Group Relative Policy Optimization (GRPO), and In-Context Learning (ICL) through a dual-layer framework combining automatic benchmarks with blinded doctor-in-the-loop assessments. GRPO achieves the highest algorithmic accuracy across multiple decision layers, confirming the value of reinforcement-based optimization for structured prediction tasks. However, clinicians consistently prefer the SFT model, citing clearer reasoning processes (p = 0.035) and higher therapeutic feasibility (p = 0.019). In blinded pairwise comparisons, SFT attains the highest winning rate (51.2%), outperforming both GRPO (26.2%) and even physicians' original decisions (22.7%). These results reveal an alignment paradox: algorithmic improvements do not necessarily translate into higher clinical trust, and may diverge from human-centered preferences. Our findings highlight the need for alignment strategies that prioritize clinically interpretable and practically feasible reasoning, rather than solely optimizing decision-level accuracy.

grpo, large language model, machine learning, (19 more...)

2511.18084

Country: Asia > China (0.29)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

arXiv.org Artificial IntelligenceNov-18-2025

Multi-Metric Preference Alignment for Generative Speech Restoration

Zhang, Junan, Zhang, Xueyao, Yang, Jing, Wang, Yuancheng, Fan, Fan, Wu, Zhizheng

Recent generative models have significantly advanced speech restoration tasks, yet their training objectives often misalign with human perceptual preferences, resulting in suboptimal quality. While post-training alignment has proven effective in other generative domains like text and image generation, its application to generative speech restoration remains largely under-explored. This work investigates the challenges of applying preference-based post-training to this task, focusing on how to define a robust preference signal and curate high-quality data to avoid reward hacking. To address these challenges, we propose a multi-metric preference alignment strategy. We construct a new dataset, GenSR-Pref, comprising 80K preference pairs, where each chosen sample is unanimously favored by a complementary suite of metrics covering perceptual quality, signal fidelity, content consistency, and timbre preservation. This principled approach ensures a holistic preference signal. Applying Direct Preference Optimization (DPO) with our dataset, we observe consistent and significant performance gains across three diverse generative paradigms: autoregressive models (AR), masked generative models (MGM), and flow-matching models (FM) on various restoration benchmarks, in both objective and subjective evaluations. Ablation studies confirm the superiority of our multi-metric strategy over single-metric approaches in mitigating reward hacking. Furthermore, we demonstrate that our aligned models can serve as powerful ''data annotators'', generating high-quality pseudo-labels to serve as a supervision signal for traditional discriminative models in data-scarce scenarios like singing voice restoration. Demo Page:https://gensr-pref.github.io

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2508.17229

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Carnemolla, Simone, Pennisi, Matteo, Russo, Chiara, Palazzo, Simone, Giordano, Daniela, Spampinato, Concetto

SeeingSounds: Learning Audio-to-Visual Alignment via Text

arXiv.org Artificial IntelligenceOct-15-2025

We introduce SeeingSounds, a lightweight and modular framework for audio-to-image generation that leverages the interplay between audio, language, and vision-without requiring any paired audio-visual data or training on visual generative models. Rather than treating audio as a substitute for text or relying solely on audio-to-text mappings, our method performs dual alignment: audio is projected into a semantic language space via a frozen language encoder, and, contextually grounded into the visual domain using a vision-language model. This approach, inspired by cognitive neuroscience, reflects the natural cross-modal associations observed in human perception. The model operates on frozen diffusion backbones and trains only lightweight adapters, enabling efficient and scalable learning. Moreover, it supports fine-grained and interpretable control through procedural text prompt generation, where audio transformations (e.g., volume or pitch shifts) translate into descriptive prompts (e.g., "a distant thunder") that guide visual outputs. Extensive experiments across standard benchmarks confirm that SeeingSounds outperforms existing methods in both zero-shot and supervised settings, establishing a new state of the art in controllable audio-to-visual generation.

large language model, machine learning, natural language, (18 more...)

doi: 10.1145/3743093.3770976

2510.11738

Country:

Europe (0.47)
Asia > Malaysia (0.16)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.54)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing SystemsOct-10-2025, 12:26:23 GMT

a76ed4a8ef522c823d73925e7fff16d4-Paper-Conference.pdf

arxiv preprint arxiv, hallucination, language model, (13 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Singapore (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial IntelligenceSep-17-2025

When Inverse Data Outperforms: Exploring the Pitfalls of Mixed Data in Multi-Stage Fine-Tuning

Deng, Mengyi, Li, Xin, Zhu, Tingyu, Yang, Zhicheng, Guo, Zhijiang, Wang, Wei

Existing work has shown that o1-level performance can be achieved with limited data distillation, but most existing methods focus on unidirectional supervised fine-tuning (SFT), overlooking the intricate interplay between diverse reasoning patterns. In this paper, we construct r1k, a high-quality reverse reasoning dataset derived by inverting 1,000 forward examples from s1k, and examine how SFT and Direct Preference Optimization (DPO) affect alignment under bidirectional reasoning objectives. SFT on r1k yields a 1.6%--6.8% accuracy improvement over s1k across evaluated benchmarks. However, naively mixing forward and reverse data during SFT weakens the directional distinction. Although DPO can partially recover this distinction, it also suppresses less preferred reasoning paths by shifting the probability mass toward irrelevant outputs. These findings suggest that mixed reasoning data introduce conflicting supervision signals, underscoring the need for robust and direction-aware alignment strategies.

large language model, machine learning, natural language, (18 more...)

2509.13079

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Azmat, Muneeza, Abbas, Momin, de Macedo, Maysa Malfiza Garcia, Grave, Marcelo Carpinette, de Souza, Luan Soares, Machado, Tiago, de Paula, Rogerio A, Horesh, Raya, Chen, Yixin, Candello, Heloisa Caroline de Souza Pereira, Nordenlow, Rebecka, Adebiyi, Aminat

A Comprehensive Evaluation framework of Alignment Techniques for LLMs

arXiv.org Artificial IntelligenceAug-15-2025

As Large Language Models (LLMs) become increasingly integrated into real-world applications, ensuring their outputs align with human values and safety standards has become critical. The field has developed diverse alignment approaches including traditional fine-tuning methods (RLHF, instruction tuning), post-hoc correction systems, and inference-time interventions, each with distinct advantages and limitations. However, the lack of unified evaluation frameworks makes it difficult to systematically compare these paradigms and guide deployment decisions. This paper introduces a multi-dimensional evaluation of alignment techniques for LLMs, a comprehensive evaluation framework that provides a systematic comparison across all major alignment paradigms. Our framework assesses methods along four key dimensions: alignment detection, alignment quality, computational efficiency, and robustness. Through experiments across diverse base models and alignment strategies, we demonstrate the utility of our framework in identifying strengths and limitations of current state-of-the-art models, providing valuable insights for future research directions.

large language model, machine learning, natural language, (17 more...)